Hi, welcome everybody to deep learning. Thanks for tuning in. Today's topic will be the
backpropagation algorithm. So you may be interested in how we actually compute these derivatives
in complex neural networks. So let's look at a simple example and this simple example here is
that we want to evaluate the following function. So our function is f(x1, x2) = (2 x1 + 3 x2)^2 + 3,
and we want to evaluate the partial derivative of f with respect to x1 at the position (1, 3).
There are two approaches that can do that quite efficiently: the first one
will be finite differences, the second one is the analytic derivative. So we will go through both
of them here. Now for finite differences the idea is that you compute the function value at
some position x plus a very small increment h, you also compute the original
function value f of x, you take the difference between the two, and then you divide by the
same value h. This is essentially the definition of the derivative: the limit of f at x
plus h minus f of x, divided by h, as we let h approach zero. Now the problem is that this is not
symmetric, so sometimes we prefer a symmetric definition. Instead of evaluating
at x and at x plus h, we go half an h back and half an h to the front. This centers
the approximation exactly at the position x, and we still divide by h. So this would be the
symmetric (central) definition.
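Written out as formulas, the two variants just described (with a small finite step size h) are:

```latex
% One-sided (forward) finite difference:
\frac{\partial f}{\partial x}(x) \approx \frac{f(x + h) - f(x)}{h}

% Symmetric (central) finite difference:
\frac{\partial f}{\partial x}(x) \approx \frac{f\!\left(x + \tfrac{h}{2}\right) - f\!\left(x - \tfrac{h}{2}\right)}{h}
```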
Now if we do that, we can apply it to our example. So let's try to evaluate this:
we take our definition f(x1, x2) = (2 x1 + 3 x2)^2 + 3 and we look at the
position (1, 3). Let's just calculate this using the plus-half-h definition from above. We
set h to a small value, say 2 times 10 to the power of minus 2,
and we plug it in. The first term is going to be (2 times (1 plus half of our h) plus 9) squared,
plus 3, where the 9 is 3 times x2 = 3 times 3. In the second term we of course have to
subtract half of the small value instead, and we divide the difference by h. This then lets us compute the
following numbers: we end up with approximately (124.4404 minus 123.5604) divided by 0.02, which
is approximately 43.999, so essentially 44. So we can compute this for any function, even if we don't know the
definition of the function. If we only have it as a module that we cannot access, we can use finite
differences to approximate the partial derivative. In practical use we take h in the range of
1 times 10 to the minus 5, which is appropriate for floating point precision. Depending
on the precision of your compute system you can also determine what the appropriate value for h
is going to be; you can check that in reference number 7. We see that this is really easy to use.
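As a quick illustration, here is a minimal Python sketch (not from the lecture; the name central_difference is my own choice) that reproduces this calculation with the symmetric definition:

```python
def f(x1, x2):
    # The example function from the lecture: f(x1, x2) = (2*x1 + 3*x2)^2 + 3
    return (2 * x1 + 3 * x2) ** 2 + 3

def central_difference(func, x1, x2, h=1e-5):
    # Symmetric finite difference with respect to x1:
    # (f(x1 + h/2, x2) - f(x1 - h/2, x2)) / h
    return (func(x1 + h / 2, x2) - func(x1 - h / 2, x2)) / h

# Reproduce the worked example with h = 2e-2 ...
print(central_difference(f, 1.0, 3.0, h=2e-2))  # approximately 44
# ... and with the h suggested for floating point precision.
print(central_difference(f, 1.0, 3.0, h=1e-5))  # approximately 44
```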
We can evaluate this on any function, and we don't need to know its formal definition,
but of course it is computationally very inefficient. Imagine you want to determine
the gradient, that is, the vector of all partial derivatives, of a function whose input has dimension
100. With the one-sided definition this means that you have to evaluate the function 101 times in order
to compute the entire gradient, and the symmetric definition even needs 200 evaluations. So this may not
be such a great choice for general optimization because it becomes
very inefficient, but of course it is a very cool method to check your implementation.
Imagine you implemented the analytic version and made a mistake somewhere; then you can use this
as a trick to check whether your analytic derivative is correctly implemented.
This is also something you will learn in the exercises here, and it is really useful when you
evaluate such derivatives in practice.
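A minimal sketch of how such a gradient check could look, assuming NumPy and my own helper names numerical_gradient and analytic_gradient (this is an illustration, not code from the lecture or the exercises):

```python
import numpy as np

def f(x):
    # Same example function, now with a vector input x = (x1, x2).
    return (2 * x[0] + 3 * x[1]) ** 2 + 3

def analytic_gradient(x):
    # Hand-derived with the chain rule:
    # df/dx1 = 2*(2*x1 + 3*x2)*2,  df/dx2 = 2*(2*x1 + 3*x2)*3
    inner = 2 * x[0] + 3 * x[1]
    return np.array([4 * inner, 6 * inner])

def numerical_gradient(func, x, h=1e-5):
    # Symmetric finite differences: one pair of function evaluations per input dimension.
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h / 2
        grad[i] = (func(x + step) - func(x - step)) / h
    return grad

x = np.array([1.0, 3.0])
print(analytic_gradient(x))      # [44. 66.]
print(numerical_gradient(f, x))  # approximately [44. 66.]
print(np.allclose(analytic_gradient(x), numerical_gradient(f, x)))  # True if the implementation matches
```

The check compares all partial derivatives at once; if one entry disagrees, the corresponding part of the analytic derivation is likely wrong.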
Now the analytic gradient we can derive by using a set of analytic
differentiation rules. The first rule is that the derivative of a constant is 0.
Then, the derivative is a linear operator, which means we can rearrange it: if you have, for example,
a sum of different components, you can differentiate each component separately. We also know the
derivatives of monomials: if you have some
x to the power of n, then the derivative is going to be n times x to the power of n minus 1.
And finally the chain rule: if you have nested functions, and the chain rule is essentially
the important ingredient that we also need for the backpropagation algorithm, then the
derivative with respect to x of a nested function f of g of x is going to be the derivative of f
with respect to g, multiplied by the derivative of g with respect to x. These rules are
summarized below.
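As a compact restatement of these four rules (my own notation, consistent with the description above):

```latex
\frac{d}{dx}\, c = 0
\qquad
\frac{d}{dx}\,\bigl(a\,u(x) + b\,v(x)\bigr) = a\,\frac{du}{dx} + b\,\frac{dv}{dx}
\qquad
\frac{d}{dx}\, x^n = n\,x^{n-1}
\qquad
\frac{d}{dx}\, f\bigl(g(x)\bigr) = \frac{df}{dg}\cdot\frac{dg}{dx}
```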
Okay, so let's place those at the very top right here; we will need them on the
next couple of slides, and let's try to calculate this. So here you see the partial derivative
with respect to x1 of f at (1, 3). Then we can just plug in the definitions, so this is going to be